Creating videos with facial expressions

ABSTRACT

The present disclosure relates to creating videos. A mobile device creates a graphic user interface to capture by the camera of the device multiple photographic facial images of a user for respective multiple facial expressions of a character in the video. Using the multiple photographic facial images, the device modifies stored character images by matching facial features of the character to facial features of the user for the multiple facial expressions of the character in the video and creates the video based on the modified character images. The facial expression of the user is used to influence the facial expression of the character. This method enables replacement of certain visual style elements with a given user's own style elements.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation application from U.S. application Ser. No. 16/320,966, filed Jan. 25, 2019, which is the national stage filing under U.S.C. 371 of International Application No. PCT/AU2017/050763, filed Jul. 25, 2017, which claims priority from U.S. provisional application 62/366,375 filed on 25 Jul. 2016, the content of which is incorporated herein by reference. The present application further claims priority from U.S. provisional application 62/366,406 filed on 25 Jul. 2016, the content of which is incorporated herein by reference. The present application further claims priority from Australian provisional application 2016902919 filed on 25 Jul. 2016, the content of which is incorporated herein by reference. The present application further claims priority from Australian provisional application 2016902921 filed on 25 Jul. 2016, the content of which is incorporated herein by reference.

TECHNICAL FIELD

The present disclosure generally relates to creating videos. The present disclosure includes computer-implemented methods, software, and computer systems for creating videos with facial expressions to reflect styles of individual persons.

BACKGROUND

A video document is often used to present content in relation to a “story”. The content typically consists of audio content, visual content, or both, for example the video documents available at YouTube. The content presented in the video document often involves at least one character and a storyline associated with the character. The storyline is used to represent how the story develops with respect to the character over time, including what the character does and the interactions of the character with other characters in the story.

Throughout this specification the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps.

Any discussion of documents, acts, materials, devices, articles or the like which has been included in the present disclosure is not to be taken as an admission that any or all of these matters form part of the prior art base or were common general knowledge in the field relevant to the present disclosure as it existed before the priority date of each claim of this application.

SUMMARY

There is provided a method for creating a video on a mobile device that comprises a camera, the method comprising:

creating a graphic user interface on the mobile device to capture by the camera multiple photographic facial images of a user for respective multiple facial expressions of a character in the video;

using the multiple photographic facial images to modify stored character images by matching facial features of the character to facial features of the user for the multiple facial expressions of the character in the video; and

creating the video based on the modified character images.

There is provided a method for creating a video including a character on a mobile device that comprises a camera, the method comprising:

(a) creating a graphic user interface on the mobile device to capture by the camera multiple photographic facial images of a user for respective multiple facial expressions;

(b) extracting a user facial feature from each of the multiple photographic facial images;

(c) storing, associated with a respective facial expression identifier, the user facial feature from each of the multiple photographic facial images;

(d) selecting one of the multiple user facial features based on a first facial expression identifier associated with a first frame of the video;

(e) determining a transformation that transforms a reference facial feature associated with the first facial expression identifier into an approximation or representation of the selected one of the multiple user facial features;

(f) modifying, based on the transformation, a reference facial image of the character associated with the first facial expression identifier and the reference facial feature; and

(g) creating the first frame of the video based on the modified reference facial image.

As can be seen from the above, the first frame of the video is created by modifying the reference facial image of the character with reference to the corresponding user facial feature. Therefore, the original character's visual style is not replaced by a given user's visual style. Instead, the facial expression of the user is used to influence the facial expression of the character. This method enables replacement of certain visual style elements with a given user's own style elements. Although this method is described with reference to facial expressions, the method is also applicable to skin tone, eye colour, etc.

The method may further comprise:

for a second facial expression identifier associated with a second frame of the video, repeating steps (d) to (g) to create the second frame of the video.

The user facial feature may comprise a set of control points.

The graphic user interface may comprise the reference facial image ofthe character.

The graphic user interface may comprise a live view of each of the multiple photographic facial images.

The live view may be positioned next to the camera.

The live view may be positioned next to the reference facial image of the character.

The method may further comprise superimposing the live view on the reference facial image of the character.

The method may further comprise selecting the character from a plurality of characters in the video.

The method may further comprise recording audio data associated with the user facial feature.

There is provided a computer software product, including machine-readable instructions which, when executed by a processor of a mobile device, cause the processor to perform any one of the methods described above.

There is provided a mobile device for creating a video including a character, the mobile device comprising:

a camera;

a display; and

a processor, the processor configured to

(a) create a graphic user interface on the display of the mobile device to capture by the camera multiple photographic facial images of a user for respective multiple facial expressions;

(b) extract a user facial feature from each of the multiple photographic facial images;

(c) store, associated with a respective facial expression identifier, the user facial feature from each of the multiple photographic facial images;

(d) select one of the multiple user facial features based on a first facial expression identifier associated with a first frame of the video;

(e) determine a transformation that transforms a reference facial feature associated with the first facial expression identifier into an approximation or representation of the selected one of the multiple user facial features;

(f) modify, based on the transformation, a reference facial image of the character associated with the first facial expression identifier and the reference facial feature;

(g) create the first frame of the video based on the modified reference facial image; and

(h) present, on the display, the first frame of the video in the graphic user interface.

There is provided a method for creating an output frame for a character in a video, the method comprising:

determining an estimated reference facial feature of the character based on a first reference facial image and a second reference facial image of the character;

determining an estimated user facial feature of a user based on a first photographic facial image and a second photographic facial image of the user;

determining a transformation that transforms the estimated reference facial feature of the character into an approximation or representation of the estimated user facial feature of the user;

modifying, based on the transformation, a third reference facial image of the character associated with the estimated reference facial feature of the character; and

creating the output frame for the character in the video based on the modified third reference facial image.

As can be seen from the above, this method determines the estimated reference facial feature of the character and the estimated user facial feature of the user, and determines the transformation based on the estimated reference facial feature of the character and the estimated user facial feature of the user. Because the features for the output frame are estimated from features already available for the key frames, rather than extracted afresh for every frame, this dramatically reduces the time required to create the output frame.

Determining the estimated reference facial feature of the character may comprise:

determining a first distance between a first reference facial feature of the first reference facial image of the character and a second reference facial feature of the second reference facial image of the character; and

determining the estimated reference facial feature of the character based on the first distance, the first reference facial feature and the second reference facial feature.

Determining the estimated reference facial feature of the character may comprise performing an interpolation operation based on the first reference facial feature and the second reference facial feature with respect to the first distance.

Determining the estimated reference facial feature of the character may comprise performing an extrapolation operation based on the first reference facial feature and the second reference facial feature with respect to the first distance.

The first reference facial feature may include a first set of control points, and the second reference facial feature may include a second set of control points, and the first distance may be indicative of a distance between the first set of control points and the second set of control points.

Determining the estimated user facial feature of the user may comprise:

determining a second distance between a user first facial feature of the first photographic facial image of the user and a user second facial feature of the second photographic facial image of the user; and

determining the estimated user facial feature based on the second distance, the user first facial feature and the user second facial feature.

Determining the estimated user facial feature of the user may comprise performing an interpolation operation based on the user first facial feature and the user second facial feature with respect to the second distance.

Determining the estimated user facial feature of the user may comprise performing an extrapolation operation based on the user first facial feature and the user second facial feature with respect to the second distance.

The user first facial feature may include a third set of control points, and the user second facial feature may include a fourth set of control points, and the second distance may be indicative of a distance between the third set of control points and the fourth set of control points.

Modifying the third reference facial image of the character may comprise transforming a first spline curve represented by the estimated reference facial feature of the character into an approximation or representation of a second spline curve represented by the estimated user facial feature of the user.

There is provided a computer software product, including machine-readable instructions which, when executed by a processor of a mobile device, cause the processor to perform any one of the methods described above.

There is provided a mobile device for creating an output frame for a character in a video, the mobile device comprising:

a camera to capture a first photographic facial image and a second photographic facial image of the user;

a display; and

a processor, the processor configured to

determine an estimated reference facial feature of the character based on a first reference facial image and a second reference facial image of the character;

determine an estimated user facial feature of the user based on the first photographic facial image and the second photographic facial image of the user;

determine a transformation that transforms the estimated reference facial feature of the character into an approximation or representation of the estimated user facial feature of the user;

modify, based on the transformation, a third reference facial image of the character associated with the estimated reference facial feature of the character;

create the output frame based on the modified third reference facial image; and

present the output frame on the display.

BRIEF DESCRIPTION OF DRAWINGS

Features of the present disclosure are illustrated by way of non-limiting examples, and like numerals indicate like elements, in which:

FIG. 1 illustrates an example mobile device for creating a video including a character in accordance with the present disclosure;

FIGS. 2A and 2B illustrate example methods for creating a video including a character on the mobile device in accordance with the present disclosure;

FIGS. 3A, 3B, 3C, 3D, 3E, 3F illustrate graphic user interfaces in accordance with the present disclosure;

FIGS. 4A, 4B and 5 illustrate facial features in accordance with the present disclosure;

FIG. 6 illustrates a detailed process for creating a video including a character on the mobile device in accordance with the present disclosure;

FIG. 7 illustrates an example mobile device for creating an output frame for a character in a video in accordance with the present disclosure;

FIG. 8 illustrates an example method for creating an output frame for a character in a video in accordance with the present disclosure;

FIG. 9 illustrates an interpolation process; and

FIG. 10A shows a transformation of the 2D coordinates of the control points to create the impression of a 3D rotation of the character's face, and FIG. 10B shows a simplified 3D model of a character's head.

DESCRIPTION OF EMBODIMENTS

A video in the present disclosure consists of a sequence of images, i.e., “frames”. Each frame differs in content from its adjacent frames (i.e., previous and next frames) by a small amount in terms of appearance. By displaying the sequence of frames at a high rate (e.g. 30 frames per second), a viewer of the sequence is given the impression of viewing a “movie clip”.

A frame of the video includes at least two “layers” of visual content. One or more layers represent the non-replaceable content. One or more layers represent replaceable characters. A replaceable character may be replaced with user-supplied content according to the method(s) as described in the present disclosure. All layers are composited together in order to produce a processed frame, or an output frame, associated with the frame.
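
For illustration only, the following sketch shows one way such compositing could be carried out with NumPy, alpha-blending a replaceable-character layer over a non-replaceable background layer. The array shapes, value range and per-pixel alpha channel are assumptions made for the example, not requirements of the present disclosure.

```python
import numpy as np

def composite_layers(background, character_rgba):
    """Alpha-blend one replaceable-character layer (RGBA) over a background (RGB).

    background:      H x W x 3 float array in [0, 1] (non-replaceable content)
    character_rgba:  H x W x 4 float array in [0, 1] (replaceable content plus alpha)
    Returns the processed (output) frame as an H x W x 3 array.
    """
    rgb = character_rgba[..., :3]
    alpha = character_rgba[..., 3:4]          # keep the last axis for broadcasting
    return alpha * rgb + (1.0 - alpha) * background

# Example: a grey background with a fully opaque red character layer in one corner.
bg = np.full((4, 4, 3), 0.5)
layer = np.zeros((4, 4, 4))
layer[:2, :2] = [1.0, 0.0, 0.0, 1.0]          # red, alpha = 1
frame = composite_layers(bg, layer)
```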

In addition to the visual image frame sequence, the video may also include one or more audio tracks. Typically, all but the replaceable character audio content occupies one single audio track. Additional audio tracks are used to store audio content for each replaceable character. This per-character content is then further subdivided into individual elements, each representing a “sound bite” (e.g. a short voiceover speech element, or a noise element) for that character in that specific story.

In the present disclosure, an original video document contains only original, or “reference”, material. This includes replaceable and non-replaceable reference content. The replaceable reference content consists of some or all of the graphical elements for each replaceable character, saved on a frame-by-frame basis. At a minimum, this content consists of the replaceable character's head or face as it appears in each frame of the reference video content. Replaceable reference content may also include elements such as hands, feet, etc. where it may be desirable to offer the users a selectable set of display options (e.g. skin colour).

The non-replaceable visual content may consist of graphical assets, arranged as sets of assets on a per-frame basis in an animation sequence that are normally used to generate video content, but with all replaceable content removed. This form of non-replaceable visual content is packaged as a number of asset layers per frame which, when combined with the associated per-frame replaceable content, forms a complete sequence of video frames.

The non-replaceable content may alternatively consist of standard video content, with replaceable reference content masked (or removed) from each video frame. In this scenario, replaceable character audio content is extracted from the original video content. In this case, the video is deconstructed on a frame-by-frame basis, either in real time or as a separate pre-processing stage where the frames are stored in a database. In either case, the deconstructed video frames are then subsequently combined with the associated per-frame replaceable content, forming a complete sequence of video frames.

In the present disclosure, a user provides material for all replaceable content (i.e., audio and visual) for a given story. In the case of audio material, the user typically provides their own “sound bite” (voiceover, etc.) for each element in a replaceable character's audio track. In the case of user-supplied visual material, the user produces a facial expression identified by a facial expression identifier or mimics the original replaceable character's video sequence, particularly, a facial expression of the original replaceable character in a key frame at a time instant. The feature of the facial expression of the user is extracted from the user photographic image captured by the camera 101 of the mobile device 100. The feature of the facial expression of the character in the key frame is also extracted. The mathematical difference between the character's features and the user's features is then used to modify the original character's facial appearance in order to better resemble the user's facial appearance. This resemblance includes, but is not limited to, the position and shape of: eyes, eyebrows, nose, mouth, and facial outline/jawline, as described with reference to FIGS. 2A and 2B and FIGS. 3 to 6.

In another example, the user produces distinctive or representative facial expressions identified by facial expression identifiers or mimics distinctive or representative facial expressions in different key frames at different time instants. The features of the facial expressions of both the user and the original replaceable character at the different time instants are extracted. The method(s) described in the present disclosure then dynamically creates a facial image of the character by using an algorithm, for example interpolation and/or extrapolation, based on these facial expression features, as described with reference to FIGS. 7 and 8.

FIG. 1 illustrates an example mobile device 100 for creating a video including a character in accordance with the present disclosure. The mobile device includes a camera 101, a display 103, and a processor 105. The camera 101, the display 103 and the processor 105 are connected to each other via a bus 107. The mobile device 100 may also include a microphone 109, and a memory device 111.

The camera 101 is an optical device that captures photographic images of the user of the mobile device 100. The photographic images captured by the camera 101 are transmitted from the camera 101 to the processor 105 for further processing, or to the memory device 111 for storage.

The display 103 in this example is a screen to present visual content to the user under control of the processor 105. For example, the display 103 displays images to the user of the mobile device 100. As described above, the images can be those captured by the camera 101, or processed by the processor 105, or retrieved from the memory device 111. Further, the display 103 is able to present a graphic user interface to the user, as shown in FIG. 1. The graphic user interface includes one or more “pages”. Each of the pages includes one or more graphic user interface elements, for example, buttons, menus, drop-down lists, text boxes, picture boxes, etc. to present visual content to the user or to receive commands from the user, as shown in FIG. 1, which represents one of the pages included in the graphic user interface.

The display 103 can also be a screen with a touch-sensitive device (not shown in FIG. 1). A virtual keyboard is displayed on the display 103, and the display 103 is able to receive commands through the touch-sensitive device when the user touches the virtual keys of the virtual keyboard, as shown in FIG. 3C.

The memory device 111 is a computer-readable medium that stores a computer software product. The memory device 111 can be part of the processor 105, for example, a Random Access Memory (RAM) device, a Read Only Memory (ROM) device, or a FLASH memory device that is integrated with the processor 105.

The memory device 111 can also be a device separate from the processor, for example, a floppy disk, a hard disk, an optical disk, or a USB stick. The memory device 111 can be directly connected to the bus 107 by inserting the memory device 111 into an appropriate interface provided by the bus 107. In another example, the memory device 111 is located remotely and connected to the bus 107 through a communication network (not shown in FIG. 1). The computer software product stored in the memory device 111 is downloaded, through the communication network, to the processor 105 for execution.

The computer software product includes machine-readable instructions. The processor 105 of the mobile device 100 loads the computer software product from the memory device 111 and reads the machine-readable instructions included in the computer software product. When these machine-readable instructions are executed by the processor 105, these instructions cause the processor 105 to perform one or more method steps described below.

FIG. 2A illustrates an example method 200 for creating a video including a character on the mobile device 100. The method 200 is performed by the processor 105 of the mobile device 100. Particularly, the processor 105 is configured to

create 201 a graphic user interface on the mobile device 100 to capture by the camera 101 multiple photographic facial images of a user for respective multiple facial expressions of the character in the video;

use 203 the multiple photographic facial images to modify stored character images by matching facial features of the character to facial features of the user for the multiple facial expressions of the character in the video; and

create 205 the video based on the modified character images.

FIG. 2B illustrates another example method 210 for creating a video including a character on the mobile device 100. The method 210 is performed by the processor 105 of the mobile device 100. Particularly, the processor 105 is configured to

(a) create 211 a graphic user interface on the mobile device 100 (particularly, the display 103 of the mobile device 100) to capture by the camera 101 multiple photographic facial images of a user for respective multiple facial expressions;

(b) extract 213 a user facial feature from each of the multiple photographic facial images;

(c) store 215, associated with a respective facial expression identifier, the user facial feature from each of the multiple photographic facial images;

(d) select 217 one of the multiple user facial features based on a first facial expression identifier associated with a first frame of the video;

(e) determine 219 a transformation that transforms a reference facial feature associated with the first facial expression identifier into an approximation or representation of the selected one of the multiple user facial features;

(f) modify 221, based on the transformation, a reference facial image of the character associated with the first facial expression identifier and the reference facial feature; and

(g) create 223 the first frame of the video based on the modified reference facial image.

The processor 105 is also configured to present, on the display 103, the first frame of the video in the graphic user interface.

For a second facial expression identifier associated with a second frame of the video, the processor 105 repeats steps (d) to (g) to create the second frame of the video.

As can be seen from the above, the first frame of the video is created by modifying the reference facial image of the character with reference to the corresponding user facial feature. Therefore, the original character's visual style is not replaced by a given user's visual style. Instead, the facial expression of the user is used to influence the facial expression of the character. This method enables replacement of certain visual style elements with a given user's own style elements. Although this method is described with reference to facial expressions, the method is also applicable to skin tone, eye colour, etc.

In the case of skin tone, for example, multiple sets of identical replaceable reference character content are supplied with the reference material package for a given story, with each set differing only in skin tone. In that way, the user alters the reference character's skin tone to mimic their own simply by selecting from a set of alternate skin tone options. The selected set of reference character content is subjected to a similar feature transformation as described above, which creates a character that is more similar in shape and colour to the user's own appearance.

The content generated by the method(s) described in the present disclosure is significantly personalised for each user, and it is constructed “on demand” in real time from sets of associated asset elements. The resulting content (a sequence of frames) can then be immediately displayed on a device. Alternatively, the generated content may be used to produce a final multimedia asset such as a static, viewable YouTube asset.

FIGS. 3A to 3F illustrate the graphic user interface in accordance with the present disclosure.

The processor 105 creates 211 a graphic user interface on the mobile device 100 to capture by the camera 101 multiple photographic facial images of a user for respective multiple facial expressions.

The graphic user interface starts with the page in FIG. 3A, which presents on the display 103 a movie library consisting of one or more movies. As shown in the page in FIG. 3A, there are multiple movies available for the user to choose to work on, for example, “Kong Fu Panda”, “Fast Friends”, “Frozen”, etc. The user chooses “Fast Friends”, and the graphic user interface proceeds to the page in FIG. 3B. The page in FIG. 3B shows a plurality of characters in this movie, for example, a boy, a turtle, and a worm. The user can select one of the characters by touching the character. The user can also select one of the characters by entering the name of the character through a virtual keyboard presented in the graphic user interface, as shown in the page in FIG. 3C. In the example shown in the page in FIG. 3C, the character of the boy is selected by the user. Upon selection of the character, the graphic user interface proceeds to the page in FIG. 3D.

The page in FIG. 3D shows a list of facial expression identifiers to identify facial expressions. The facial expression identifiers serve the purpose of guiding the user to produce facial expressions identified by the facial expression identifiers. A facial expression identifier can be a text string indicative of the name of a facial expression, for example, “Smile”, “Frown”, “Gaze”, “Surprise”, and “Grave”, as shown in the page in FIG. 3D. The facial expression identifier can include an icon, for example, the icon for the facial expression “Gaze”. The facial expression identifier can also include a reference facial image of the character extracted from the movie, for example, the facial image of the character in a frame of the movie where the character is “surprised”, which makes it easier for the user to produce the corresponding facial expression. The facial expression identifier can take other forms without departing from the scope of the present disclosure.

As shown in the page in FIG. 3D, the user is producing a facial expression identified by a text string “Surprise” with a reference facial image of the character. The user recognises the facial expression identifier and produces the corresponding facial expression. The facial image of the user is captured by the camera 101 and presented in a live view of the graphic user interface. The live view of the user's facial image is positioned next to the camera 101 to alleviate the issue where the user does not appear to look at the camera 101 when the user is looking at the live view. The processor 105 also displays the reference facial image of the character in a character view of the graphic user interface. The live view is also positioned next to the reference facial image of the character to make it easier for the user to compare the facial expression of the user and the facial expression of the character. In another example, the processor 105 further superimposes the live view of the facial image of the user on the reference facial image of the character to make it even easier for the user to compare the facial expression of the user and the facial expression of the character.

If the user, or another person (for example, a director), is satisfied with the facial expression of the user, the user or the other person clicks on the shutter button of the graphic user interface to capture the photographic facial image of the user. The photographic facial image of the user can be displayed in a picture box associated with the facial expression identifier. For example, the photographic facial images of the user for the facial expressions “Smile” and “Frown” are displayed in respective picture boxes, as shown in the page in FIG. 3D.

In another example, instead of taking photos of the user, the processor 105 may retrieve photographic facial images of the user that have been stored in the memory device 111 and associate the photographic facial images with the corresponding facial expression identifiers.

The photographic facial image of the user is transmitted from the camera 101 to the processor 105. The processor 105 extracts 213 a user facial feature “U4” from the photographic facial image corresponding to the facial expression identifier “Surprise”. The processor 105 stores 215, in a user feature table associated with the facial expression identifier “Surprise”, the user facial feature “U4”, as shown in the fourth entry of the user feature table below.

User Feature Table

Expression ID    User Facial Feature    User Sound
Smile            U1                     S1
Frown            U2                     S2
Gaze             U3                     S3
Surprise         U4                     S4
Grave            U5                     S5

On the page in FIG. 3E of the graphic user interface, the processor 105 records, through the microphone 109, audio data “S4” associated with the user facial feature “U4”. The processor 105 further stores the audio data “S4” in the user feature table in association with the facial expression identifier “Surprise”, as shown in the fourth entry of the user feature table above.

The processor 105 repeats the above steps for each expression identifier in the page in FIG. 3D, and populates the user feature table for the character of the boy, which associates the facial expression identifiers with the corresponding user facial features and audio data. For other characters in the movie, the processor 105 can similarly generate respective user feature tables for those characters.
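
For illustration, the user feature table can be held as a simple mapping from facial expression identifier to the captured feature and sound bite. The sketch below assumes features are stored as N x 2 NumPy arrays of control points (68 points is merely an illustrative count) and sounds as raw bytes; none of these representations are mandated by the present disclosure.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class UserEntry:
    facial_feature: np.ndarray   # N x 2 control points extracted from the user's photo
    sound: bytes                 # recorded audio clip for this expression

# One user feature table per character (here, the boy), keyed by expression identifier.
user_feature_table = {
    "Smile":    UserEntry(facial_feature=np.zeros((68, 2)), sound=b""),  # U1, S1
    "Frown":    UserEntry(facial_feature=np.zeros((68, 2)), sound=b""),  # U2, S2
    "Gaze":     UserEntry(facial_feature=np.zeros((68, 2)), sound=b""),  # U3, S3
    "Surprise": UserEntry(facial_feature=np.zeros((68, 2)), sound=b""),  # U4, S4
    "Grave":    UserEntry(facial_feature=np.zeros((68, 2)), sound=b""),  # U5, S5
}

# Step (d): select the user feature for the expression identifier of the current frame.
selected = user_feature_table["Surprise"].facial_feature
```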

FIGS. 4 and 5 illustrate facial features in accordance with the present disclosure.

Facial features in the present disclosure include a set of control points. FIG. 4A represents a facial image of an object, which is captured by a camera. The object in the present disclosure can be a user or a character in a movie. The facial image in FIG. 4A shows the object to be generally front-facing such that all key areas of the face are visible: both eyes, both eyebrows, nose, mouth, and jawline. Ideally, these areas should be largely unobstructed.

The dots in FIG. 4B represent a set of control points extracted by the processor 105. A third-party software library is used to extract the set of points from the facial image shown in FIG. 4A. There are a number of public domain libraries available for this purpose, some of which are based on the open source “OpenCV” library. The set of control points that are extracted from the facial image may comply with an industry standard, for example, MPEG-4, ISO/IEC 14496-1, 14496-2, etc. For example, the control points shown in FIG. 5 comply with the MPEG-4 standard. The facial shape of the object may be reconstructed by connecting those control points with segments. In another example, the facial shape may also be reconstructed by using one or more spline curves that are based on those control points.
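
As an example of the kind of third-party library mentioned above, the sketch below extracts a set of facial control points with the widely used dlib 68-point landmark predictor. The model file name is an assumption (the model is distributed separately from dlib), and any comparable OpenCV-based facemark detector could be substituted.

```python
import cv2
import dlib
import numpy as np

# Assumed path; the 68-point model file must be downloaded separately.
PREDICTOR_PATH = "shape_predictor_68_face_landmarks.dat"

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor(PREDICTOR_PATH)

def extract_control_points(image_path):
    """Return an N x 2 array of facial control points for the first detected face."""
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)                 # upsample once to find smaller faces
    if not faces:
        raise ValueError("no front-facing face found")
    shape = predictor(gray, faces[0])
    return np.array([(p.x, p.y) for p in shape.parts()], dtype=np.float32)

# e.g. user_feature = extract_control_points("user_surprise.jpg")
```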

FIG. 6 illustrates a detailed process 400 for creating a video including a character on the mobile device 100 in accordance with the present disclosure.

For description purposes, a storyline is shown in FIG. 6 to indicate a sequence of facial expression identifiers of the character of the boy over time. Particularly, there are five facial expression identifiers labelled along the storyline at five time instants “A” to “E”, which are “Smile”, “Gaze”, “Frown”, “Grave”, and “Smile”. These facial expression identifiers indicate the facial expressions of the character in the frames at the five time instants. The processor 105 also extracts frames at the five time instants from the video document of the movie “Fast Friends”. The frame at time instant “A” contains a facial image of the character that corresponds to the facial expression identified by the facial expression identifier “Smile”. The facial image of the character at time instant “A” is also shown in FIG. 6 for description purposes. The processor 105 extracts a facial expression feature “R1” from the facial image of the character as a reference facial feature associated with the facial expression identifier “Smile”. The facial image of the character at time instant “A” is used as a reference facial image associated with the facial expression identifier “Smile” and the reference facial feature “R1”.

Referring to the user feature table, the processor 105 selects 217 one of the multiple user facial features based on the facial expression identifier “Smile” associated with the frame at time instant “A” in the video. In this example, the processor 105 selects a user facial feature “U1” since the user facial feature “U1” is associated with the facial expression identifier “Smile” in the user feature table. The processor 105 may further select audio data “S1” associated with the facial expression identifier “Smile”.

The processor 105 determines 219 a transformation that transforms the reference facial feature “R1” associated with the facial expression identifier “Smile” into an approximation or representation of the selected user facial feature “U1”. For example, the transformation can be a transformation matrix that transforms the control points of the reference facial feature “R1” into an approximation or representation of the control points of the selected user facial feature “U1”.

The processor 105 modifies 221, based on the transformation, the reference facial image associated with the facial expression identifier “Smile” and the reference facial feature “R1”. Particularly, the processor 105 may modify the reference facial image by changing the positions of pixels in the reference facial image based on the transformation. The processor 105 then creates 223 the frame at time instant “A” of the video based on the modified reference facial image by, for example, combining the modified reference facial image and the selected audio data “S1” associated with the facial expression identifier “Smile”.
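
A minimal sketch of steps 219 and 221 is given below. It assumes the control points of the reference facial feature “R1” and the user facial feature “U1” are already available as N x 2 arrays in corresponding order, fits a similarity transform with OpenCV, and applies it to the reference facial image; the present disclosure leaves the exact form of the transformation open, so this is only one possible choice.

```python
import cv2
import numpy as np

def match_reference_to_user(reference_image, reference_points, user_points):
    """Estimate a transform taking the reference feature R1 towards the user feature U1,
    then move the pixels of the reference facial image accordingly."""
    src = np.asarray(reference_points, dtype=np.float32)   # control points of R1
    dst = np.asarray(user_points, dtype=np.float32)        # control points of U1
    # Similarity transform (rotation, uniform scale, translation) fitted robustly.
    matrix, _inliers = cv2.estimateAffinePartial2D(src, dst)
    height, width = reference_image.shape[:2]
    return cv2.warpAffine(reference_image, matrix, (width, height))
```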

While the user-recorded audio data may be associated with a facial expression, the audio data may equally be independent from the facial expressions but otherwise associated with the story line. For example, the user may record audio data for what the character says in a particular scene where no facial expression identifier is associated with frames in that scene. It is noted that the proposed methods and systems may perform only the disclosed face modification techniques or only the audio voice-over techniques or both.

The processor 105 repeats the above process for each of the characters contained in the frame at time instant “A” and/or each of the frames at the five time instants “A” to “E” along the storyline. As a result, the frames at those time instants in the video contain personal expression features of the user, and thus the video becomes more personalised and user-friendly when played, as shown on the page of the graphic user interface shown in FIG. 3F. It can be seen from the page in FIG. 3F that the shape of the face of the character is more like the user's actual face than the original character's face is.

FIG. 7 illustrates an example mobile device 700 for creating an output frame for a character in a video in accordance with the present disclosure. The mobile device 700 includes a camera 701, a display 703, and a processor 705. The camera 701, the display 703 and the processor 705 are connected to each other via a bus 707. The mobile device 700 may also include a microphone 709, and a memory device 711.

The camera 701 is an optical device that captures photographic images of the user of the mobile device 700. The photographic images captured by the camera 701 are transmitted from the camera 701 to the processor 705 for further processing, or to the memory device 711 for storage.

The display 703 in this example is a screen to present visual content to the user under control of the processor 705. For example, the display 703 displays images to the user of the mobile device 700. As described above, the images can be those captured by the camera 701, or processed by the processor 705, or retrieved from the memory device 711. Further, the display 703 is able to present a graphic user interface to the user, as shown in FIG. 7.

The memory device 711 is a computer-readable medium that stores a computer software product. The memory device 711 can be part of the processor 705, for example, a Random Access Memory (RAM) device, a Read Only Memory (ROM) device, or a FLASH memory device that is integrated with the processor 705.

The memory device 711 can also be a device separate from the processor, for example, a floppy disk, a hard disk, an optical disk, or a USB stick. The memory device 711 can be directly connected to the bus 707 by inserting the memory device 711 into an appropriate interface provided by the bus 707. In another example, the memory device 711 is located remotely and connected to the bus 707 through a communication network (not shown in FIG. 7). The computer software product stored in the memory device 711 is downloaded, through the communication network, to the processor 705 for execution.

The computer software product includes machine-readable instructions. The processor 705 of the mobile device 700 loads the computer software product from the memory device 711 and reads the machine-readable instructions included in the computer software product. When these machine-readable instructions are executed by the processor 705, these instructions cause the processor 705 to perform one or more method steps described below.

FIG. 8 illustrates an example method 800 for creating an output frame for a character in a video in accordance with the present disclosure. The method 800 is used to create an output frame based on a first reference facial image and a second reference facial image of the character. The first reference facial image of the character is in a first key frame of the video, and the second reference facial image of the character is in a second key frame of the video. The output frame can be a frame between the first key frame and the second key frame along the storyline, or outside the first key frame and the second key frame along the storyline. The method 800 is performed by the processor 705 of the mobile device 700.

The camera 701 of the mobile device 700 captures a first photographic facial image and a second photographic facial image of the user, and the processor 705 is configured to

determine 810 an estimated reference facial feature of the character based on the first reference facial image and the second reference facial image of the character;

determine 820 an estimated user facial feature of the user based on the first photographic facial image and the second photographic facial image of the user;

determine 830 a transformation that transforms the estimated reference facial feature of the character into an approximation or representation of the estimated user facial feature of the user;

modify 840, based on the transformation, a third reference facial image of the character associated with the estimated reference facial feature of the character; and

create 850 the output frame based on the modified third reference facial image.

The processor 705 is further configured to present the output frame on the display 703.

As can be seen from the above, the method 800 determines the estimated reference facial feature of the character and the estimated user facial feature of the user, and determines the transformation based on the estimated reference facial feature of the character and the estimated user facial feature of the user. Because the facial features for the output frame are estimated from the key-frame features rather than extracted anew for every frame, this dramatically reduces the time required to create the output frame. A detailed process for creating the output frame is described below.

As shown in FIG. 7, two time instants “A”, “B” along the storyline are selected by the user or the director as the facial expressions of the character at these time instants are distinctive or representative. The facial expressions of the character at the time instants “A”, “B” are identified as “Surprise” and “Grave”, respectively. A facial image of the character is extracted from the first key frame at time instant “A”, and is referred to as a first reference facial image. A facial image of the character is extracted from the second key frame at time instant “B”, and is referred to as a second reference facial image. Both reference facial images of the character are shown in the graphic user interface for the user's reference.

The processor 705 determines 810 an estimated reference facial feature of the character based on the first reference facial image and the second reference facial image of the character. Particularly, the processor 705 extracts a reference facial feature of the character from the first reference facial image of the character, referred to as a first reference facial feature. The processor 705 also extracts a reference facial feature of the character from the second reference facial image of the character, referred to as a second reference facial feature.

The processor 705 further determines a first distance between the first reference facial feature of the first reference facial image and the second reference facial feature of the second reference facial image. The processor 705 determines the estimated reference facial feature of the character based on the first distance, the first reference facial feature and the second reference facial feature. As described above, the first reference facial feature includes a first set of control points, and the second reference facial feature includes a second set of control points. As a result, the first distance is indicative of a distance between the first set of control points and the second set of control points.

If the output frame is between the first key frame and the second key frame, for example, time instant “C” between time instants “A”, “B”, the processor 705 determines the estimated reference facial feature of the character by performing an interpolation operation based on the first reference facial feature and the second reference facial feature with respect to the first distance.

On the other hand, if the output frame is outside the first key frame and the second key frame, the processor 705 determines the estimated reference facial feature of the character by performing an extrapolation operation based on the first reference facial feature and the second reference facial feature with respect to the first distance.
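
The interpolation and extrapolation of the estimated reference facial feature (and, analogously, of the estimated user facial feature) can be expressed as a linear blend of the two key-frame features. The sketch below assumes the features are N x 2 arrays with corresponding control points and that a blend parameter t is derived from the position of the output frame relative to the key frames; the numeric values are invented for illustration.

```python
import numpy as np

def feature_distance(feature_a, feature_b):
    """Distance between two sets of corresponding control points (mean point-to-point distance)."""
    return float(np.mean(np.linalg.norm(feature_a - feature_b, axis=1)))

def estimate_feature(feature_a, feature_b, t):
    """Blend two key-frame features.  0 <= t <= 1 interpolates between the key frames
    at time instants "A" and "B"; t < 0 or t > 1 extrapolates beyond them."""
    return (1.0 - t) * feature_a + t * feature_b

# Example: output frame at time instant "C" halfway between "A" and "B".
r1 = np.array([[100.0, 200.0], [150.0, 205.0]])   # first reference feature (2 points shown)
r2 = np.array([[104.0, 196.0], [154.0, 199.0]])   # second reference feature
first_distance = feature_distance(r1, r2)
estimated_reference = estimate_feature(r1, r2, t=0.5)
```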

The user recognises the first facial expression identifier “Surprise” and/or observes the first reference facial image of the character (i.e., the facial image of the character at time instant “A”), and produces a facial expression that corresponds to the first facial expression identifier “Surprise”. If the user or the director is satisfied with the facial expression of the user, a facial image of the user is captured by the camera 701, referred to as a first photographic facial image.

Similarly, the user recognises the second facial expression identifier “Grave” and/or observes the second reference facial image of the character (i.e., the facial image of the character at time instant “B”), and produces a facial expression that corresponds to the second facial expression identifier “Grave”. If the user or the director is satisfied with the facial expression of the user, a facial image of the user is captured by the camera 701, referred to as a second photographic facial image.

In another example, instead of taking photos of the user, the processor 705 may retrieve photographic facial images of the user that have been stored in the memory device 711 and associate the photographic facial images with the corresponding facial expression identifiers.

Both the first photographic facial image and the second photographic facial image of the user are transmitted from the camera 701 to the processor 705.

The processor 705 determines 820 an estimated user facial feature of a user based on the first photographic facial image and the second photographic facial image of the user. Particularly, the processor 705 extracts a facial feature from the first photographic facial image of the user, referred to as a user first facial feature. The processor 705 also extracts a facial feature from the second photographic facial image of the user, referred to as a user second facial feature.

The processor 705 further determines a second distance between the user first facial feature and the user second facial feature. The processor 705 determines the estimated user facial feature of the user based on the second distance, the user first facial feature and the user second facial feature. As described above, the user first facial feature includes a third set of control points, and the user second facial feature includes a fourth set of control points. As a result, the second distance is indicative of a distance between the third set of control points and the fourth set of control points.

If the output frame is between the first key frame and the second key frame, for example, time instant “C” between time instants “A”, “B”, the processor 705 determines the estimated user facial feature of the user by performing an interpolation operation based on the user first facial feature and the user second facial feature with respect to the second distance.

On the other hand, if the output frame is outside the first key frame and the second key frame, the processor 705 determines the estimated user facial feature of the user by performing an extrapolation operation based on the user first facial feature and the user second facial feature with respect to the second distance.

FIG. 9 illustrates the interpolation process 900 in more detail. In this example, the storyline 901 is annotated with facial expression identifiers, and FIG. 9 also shows the corresponding control points of the facial features. The y-axis 902 indicates the y-position of the central control point 903 of the lips. In this example, the storyline evolves from a smile 911 to a frown 912, back to a smile 913, and finally into a frown 914 again. Correspondingly, the control point 903 starts from a low position 921, moves into a high position 922, back to a low position 923, and finally into a high position 924. For the frames between the smile 911 and the frown 912, processor 705 may interpolate the y-position of control point 903 using a linear interpolation method. In some examples, however, this may lead to an unnatural appearance at the actual transition points, such as a sharp corner at point 922. Therefore, processor 705 may generate a spline interpolation 904 using the y-coordinates of the points 921, 922, 923 and 924 as knots. This results in a smooth transition between the facial expressions. While control point 903 moves only in the y-direction in this example, control points are generally allowed to move in both dimensions. Therefore, the spline curve 904 may be a two-dimensional spline approximation of the knots to allow the processor 705 to interpolate both x- and y-coordinates.
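
A sketch of the spline interpolation 904 follows, using SciPy's CubicSpline with the key-frame y-coordinates of control point 903 as knots and comparing it with plain linear interpolation; the frame numbers and coordinate values are invented for illustration.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Key frames along the storyline (smile 911, frown 912, smile 913, frown 914)
# and the y-coordinate of the central lip control point 903 at each of them.
key_frames = np.array([0, 30, 60, 90])
key_y = np.array([120.0, 160.0, 122.0, 158.0])    # positions 921, 922, 923, 924 (illustrative)

spline = CubicSpline(key_frames, key_y)

# Smoothly interpolated y-position for every frame in between; no sharp corner at the knots.
frames = np.arange(0, 91)
y_interpolated = spline(frames)

# Linear interpolation, by contrast, produces a corner at each key frame.
y_linear = np.interp(frames, key_frames, key_y)
```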

The processor 705 determines 830 a transformation that transforms the estimated reference facial feature of the character into an approximation or representation of the estimated user facial feature of the user. As described above, the transformation can be a transformation matrix that transforms the control points of the estimated reference facial feature of the character into an approximation or representation of the control points of the estimated user facial feature of the user.

If the output frame is between the first key frame and the second key frame, for example, time instant “C” between time instants “A”, “B”, the processor 705 determines a further reference facial image of the character by performing an interpolation operation based on the first reference facial image and the second reference facial image of the character, referred to as a third reference facial image. The third reference facial image is associated with the estimated reference facial feature of the character.

On the other hand, if the output frame is outside the first key frame and the second key frame, the processor 705 determines the third reference facial image of the character by performing an extrapolation operation based on the first reference facial image and the second reference facial image.

The processor 705 modifies 840, based on the transformation, the third reference facial image of the character by, for example, changing the positions of pixels in the third reference facial image. Since the estimated reference facial feature of the character may represent a spline curve, referred to as a first spline curve, and the estimated user facial feature of the user may represent another spline curve, referred to as a second spline curve, modifying the third reference facial image of the character also results in transforming the first spline curve into an approximation or representation of the second spline curve.

The processor 705 repeats the above steps for each of the characters in the first key frame and the second key frame, and creates 850 the output frame for the characters in the video based on the modified third reference facial images for those characters. For example, the processor 705 may create the output frame by combining the modified third reference facial images into the output frame.

Once the output frame is created, processor 705 may apply a perspective transformation on the output frame. Since the output movie is ultimately displayed on a 2D device, processor 705 applies the transformation on the 2D coordinates of control points to create the impression of a 3D rotation. FIG. 10A shows a transformation of the 2D coordinates of the control points to create the impression of a 3D rotation of the character's face. The degree of rotation may be known from the storyline and therefore, processor 705 calculates a transformation that creates the corresponding impression. This transformation may also be integrated into the previous transformation applied to the reference image. Processor 705 may also create the impression of perspective by down-scaling points that are further away from the virtual camera.
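
One way to create such an impression on the control points alone is sketched below: the 2D points are rotated about a vertical axis through an assumed pivot and re-projected with a simple perspective divide, so points that rotate away from the virtual camera are down-scaled. The pivot position, rotation angle and focal length are illustrative assumptions.

```python
import numpy as np

def fake_3d_rotation(points_2d, pivot, angle_deg, focal=500.0):
    """Rotate 2D control points about a vertical axis through `pivot` and
    re-project them with a simple perspective divide.

    points_2d: N x 2 array of (x, y) control points
    pivot:     (x, y) pivot point, e.g. the top of the neck
    """
    theta = np.radians(angle_deg)
    x = points_2d[:, 0] - pivot[0]
    y = points_2d[:, 1] - pivot[1]
    # Treat the face as flat (z = 0) and rotate it about the vertical axis.
    x_rot = x * np.cos(theta)
    z_rot = x * np.sin(theta)
    # Points that rotate away from the virtual camera are scaled down (perspective).
    scale = focal / (focal + z_rot)
    x_proj = x_rot * scale + pivot[0]
    y_proj = y * scale + pivot[1]
    return np.stack([x_proj, y_proj], axis=1)

# e.g. rotated = fake_3d_rotation(control_points, pivot=(160.0, 300.0), angle_deg=20.0)
```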

FIG. 10B illustrates a simplified 3D model of a character's head. This 3D model may be created by a designer or developer once for each character. Based on the 3D model, processor 705 can calculate which control points are not visible because they are occluded by other parts of the head. In the example of FIG. 10B, the right eye is occluded and not visible. Applying this calculation to the output image to hide the parts of the image that are not visible according to the 3D model increases the realistic impression of the created video. The calculation may be based on an assumed pivot point, which may be the top of the neck. The processor 705 can then perform the transformation based on rotation and tilt around the pivot point.
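
A correspondingly simple visibility test is sketched below: control points attached to the simplified head model are rotated about the pivot, and a point is flagged as hidden when the head lies between it and the virtual camera. Approximating the head as a sphere is an assumption made purely for this example.

```python
import numpy as np

def visible_points(points_3d, head_center, head_radius, yaw_deg, pivot):
    """Rotate 3D control points (and the head sphere) about a vertical axis through
    the pivot, then flag each point as visible unless the sphere occludes it.

    The camera is assumed to sit on the +z axis looking towards the origin,
    so a larger z coordinate means closer to the camera."""
    theta = np.radians(yaw_deg)
    rot = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                    [0.0,           1.0, 0.0          ],
                    [-np.sin(theta), 0.0, np.cos(theta)]])
    pts = (points_3d - pivot) @ rot.T + pivot
    center = (head_center - pivot) @ rot.T + pivot
    # A point is occluded if it falls inside the sphere's silhouette as seen by the
    # camera while lying further from the camera than the sphere centre.
    in_silhouette = np.linalg.norm(pts[:, :2] - center[:2], axis=1) < head_radius
    behind_center = pts[:, 2] < center[2]
    return ~(in_silhouette & behind_center)

# e.g. mask = visible_points(model_points, head_center=np.array([0.0, 0.0, 0.0]),
#                            head_radius=80.0, yaw_deg=60.0, pivot=np.array([0.0, -90.0, 0.0]))
```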

Both processes in FIGS. 10A and 10B may be performed on the control points only. The reference image can then be transformed as described above, which creates the impression of a 3D rotation of the reference image at the same time as making the reference image similar to the user's face geometry.

It should be understood that the example methods of the present disclosure might be implemented using a variety of technologies. For example, the methods described herein may be implemented by a series of computer executable instructions residing on a suitable computer readable medium. Suitable computer readable media may include volatile (e.g. RAM) and/or non-volatile (e.g. ROM, disk) memory, carrier waves and transmission media. Exemplary carrier waves may take the form of electrical, electromagnetic or optical signals conveying digital data streams along a local network or a publicly accessible network such as the internet.

It should also be understood that, unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “determining”, “obtaining”, or “receiving” or “sending” or “authenticating” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that processes and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

CLAIMS

1. A method for creating a video on a mobile device that comprises a camera, the method comprising: creating a graphic user interface on the mobile device to capture by the camera multiple photographic facial images of a user for respective multiple facial expressions of a character in the video; using the multiple photographic facial images to modify stored character images by: matching facial features of the character, comprising control points, to facial features of the user for the multiple facial expressions of the character in the video, and interpolating between the control points associated with a first facial expression of the multiple facial expressions and control points associated with a second facial expression of the multiple facial expressions; and creating the video based on the modified character images.

2. A method for creating a video including a character on a mobile device that comprises a camera, the method comprising: (a) creating a graphic user interface on the mobile device to capture by the camera multiple photographic facial images of a user for respective multiple facial expressions; (b) extracting a user facial feature, comprising control points, from each of the multiple photographic facial images; (c) storing, associated with a respective facial expression identifier, the user facial feature from each of the multiple photographic facial images; (d) selecting one of the multiple user facial features based on a first facial expression identifier associated with a first frame of the video; (e) determining a transformation that transforms a reference facial feature, comprising control points, associated with the first facial expression identifier into an approximation or representation of the selected one of the multiple user facial features; (f) modifying, based on the transformation, a reference facial image of the character associated with the first facial expression identifier and the reference facial feature; and (g) creating the first frame of the video based on the modified reference facial image; for a second facial expression identifier associated with a second frame of the video, repeating steps (d) to (g) to create the second frame of the video; and interpolating between control points of the reference facial feature associated with the first frame of the video and control points of the reference facial feature associated with the second frame of the video to create an output frame between the first frame and the second frame.

5. The method of any one of the preceding claims, wherein the graphic user interface comprises the reference facial image of the character.

6. The method of any one of the preceding claims, wherein the graphic user interface comprises a live view of each of the multiple photographic facial images.

7. The method of claim 6, wherein the live view is positioned next to the camera.

8. The method of claim 7, wherein the live view is positioned next to the reference facial image of the character.

9. The method of claim 6, further comprising superimposing the live view on the reference facial image of the character.

10. The method of any one of the preceding claims, further comprising selecting the character from a plurality of characters in the video.

11. The method of any one of the preceding claims, further comprising recording audio data associated with the user facial feature.

12. A computer software product, including machine-readable instructions which, when executed by a processor of a mobile device, cause the processor to perform any one of the preceding methods.

13. A mobile device for creating a video including a character, the mobile device comprising: a camera; a display; and a processor, the processor configured to (a) create a graphic user interface on the display of the mobile device to capture by the camera multiple photographic facial images of a user for respective multiple facial expressions; (b) extract a user facial feature, comprising control points, from each of the multiple photographic facial images; (c) store, associated with a respective facial expression identifier, the user facial feature from each of the multiple photographic facial images; (d) select one of the multiple user facial features based on a first facial expression identifier associated with a first frame of the video; (e) determine a transformation that transforms a reference facial feature, comprising control points, associated with the first facial expression identifier into an approximation or representation of the selected one of the multiple user facial features; (f) modify, based on the transformation, a reference facial image of the character associated with the first facial expression identifier and the reference facial feature; (g) create the first frame of the video based on the modified reference facial image; (h) present, on the display, the first frame of the video in the graphic user interface; for a second facial expression identifier associated with a second frame of the video, repeat steps (d) to (g) to create the second frame of the video; and interpolate between control points of the reference facial feature associated with the first frame of the video and control points of the reference facial feature associated with the second frame of the video to create an output frame between the first frame and the second frame.